library(readxl)
avg_price <- read_excel("/Users/ali/Documents/GitHub/asm-time-series-arimax/data/UK-house-price-index.xlsx", sheet = 3)
source("validation.R")
london_price <- avg_price[2:nrow(avg_price), c("...1", "LONDON")] # drop header row, keep date and London columns
london_price$LONDON <- as.numeric(london_price$LONDON) # coerce to numeric
serie <- ts(london_price$LONDON, start = c(1995, 1), freq = 12)
plot(serie,main="London housing prices")
abline(v=1995:2024,col=4,lty=3)
We shall first diagnose whether the series is stationary and, if it is not, transform it into a stationary one.
To diagnose non-constant variance, we will examine two plots:
boxplot(serie~floor(time(serie)), xlab='time (years)')
mserie <- matrix(serie, ncol = 12, byrow = TRUE) # one row per year (the incomplete last year is padded by recycling)
m <- apply(mserie, 1, mean) # mean of each row (year)
v <- apply(mserie, 1, var)  # variance of each row (year)
plot(v ~ m, xlab='mean', ylab='variance')
abline(lm(v ~ m), col=2, lty=3)
text(m,v,1995:2024)
In the first plot we can observe that the variance is not constant across years: the boxes differ in size from year to year. In the second plot, higher values of the mean correspond to higher values of the variance. Therefore we shall consider transforming the scale. We can use the Box-Cox method to check whether the logarithm would be a good option.
library(forecast)
BoxCox.lambda(serie+1)
## [1] 0.5087956
The estimated lambda is about 0.51, clearly below 1, so a variance-stabilizing transformation is warranted. Strictly, lambda near 0.5 points to a square-root transform, but we use the logarithm, which has a similar effect here and is easier to interpret (log differences approximate percentage changes).
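As a quick visual check (a sketch, assuming `serie` and the `forecast` package from above), we can compare the usual Box-Cox landmark transformations against the estimated lambda:

```r
library(forecast)

lambda <- BoxCox.lambda(serie)  # about 0.51 for this series
# Landmarks: lambda = 1 (no transform), 0.5 (square root), 0 (logarithm)
par(mfrow = c(1, 3))
plot(serie,       main = "lambda = 1 (none)")
plot(sqrt(serie), main = "lambda = 0.5 (sqrt)")
plot(log(serie),  main = "lambda = 0 (log)")
par(mfrow = c(1, 1))
```

The log and square-root versions should both damp the growth in spread over time, which is what we need.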
lnserie=log(serie)
plot(lnserie)
We then check the same plots to see whether the transformation improved things.
#boxplot
boxplot(lnserie~floor(time(lnserie)), xlab='time (years)')
#mean/var plot
mserie <- matrix(lnserie, ncol = 12, byrow = TRUE) # one row per year (the incomplete last year is padded by recycling)
m <- apply(mserie, 1, mean) # mean of each row (year)
v <- apply(mserie, 1, var)  # variance of each row (year)
plot(v ~ m, xlab='mean', ylab='variance')
abline(lm(v ~ m), col=2, lty=3)
text(m,v,1995:2024)
# comparison boxplot between series
par(mfrow=c(1,2))
boxplot(serie~floor(time(lnserie)), xlab='time (years)')
boxplot(lnserie~floor(time(lnserie)), xlab='time (years)')
The boxes seem to have become smaller and more similar in size.

#### Is there a Seasonal Pattern?
monthplot(lnserie)
The monthly means show little variation across months, so there does not seem to be a seasonal pattern.
ts.plot(matrix(lnserie,nrow=12),col=1:8)
There do not seem to be seasonal monthly patterns either. We will plot the additive decomposition of the time series for more insight.
decomposed <- decompose(lnserie,type = "additive")
plot(decomposed)
We can see that there is a seasonal pattern after all; we will explore it further later on.
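To put a number on how strong that seasonal component is, one common measure (from Hyndman and Athanasopoulos) compares the variance of the remainder with and without the seasonal term. This is only a sketch, assuming `decomposed` from above:

```r
# F_s near 0 -> weak seasonality; near 1 -> strong seasonality
remainder <- na.omit(decomposed$random)   # decompose() leaves NAs at the ends
seasonal  <- window(decomposed$seasonal,
                    start = start(remainder), end = end(remainder))
Fs <- max(0, 1 - var(remainder) / var(seasonal + remainder))
Fs
```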
There is an almost-linear trend. We will apply one regular difference to try to make the mean constant:
d1lnserie=diff(lnserie)
plot(d1lnserie)
#abline(h=mean(lnserie))
abline(h=mean(d1lnserie),col=2)
The mean of lnserie is not constant and is far from 0:
mean(lnserie)
## [1] 12.4715
And the mean of d1lnserie is close to 0:
mean(d1lnserie)
## [1] 0.005490386
We shall check for over-differencing: take an extra difference and compare the variances.
d1d1lnserie=diff(d1lnserie)
plot(d1d1lnserie)
abline(h=0)
abline(h=mean(d1d1lnserie),col=2)
var(lnserie)
## [1] 0.357134
var(d1lnserie)
## [1] 0.0001493162
var(d1d1lnserie)
## [1] 0.0002410005
mean(lnserie)
## [1] 12.4715
mean(d1lnserie)
## [1] 0.005490386
mean(d1d1lnserie)
## [1] 4.04042e-05
Although the mean gets closer to 0, the variance increases when we take an extra regular difference. Therefore we do not need it.
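As a cross-check on this visual diagnosis (a sketch, assuming the `forecast` package is loaded), unit-root based heuristics can estimate the number of differences directly:

```r
library(forecast)
ndiffs(lnserie)   # regular differences suggested (KPSS-based by default)
nsdiffs(lnserie)  # seasonal differences suggested for period 12
```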
Final transformation for the London housing series: \(W_t=(1-B)log(serie)\)
The d1lnserie ACF is the following:
acf(d1lnserie,ylim=c(-1,1),lag.max=12*5,lwd=2,col=c(2,rep(1,11)), main ="ACF(d1lnserie)")
acf(d1lnserie,ylim=c(-1,1),lag.max=12*25,lwd=2,col=c(2,rep(1,11)), main ="ACF(d1lnserie)")
The ACF plot shows a clear spike at lag 12; more precisely, the recurring cycle every 12 lags suggests yearly seasonality in the data.
The ACF tails off slowly rather than cutting off abruptly at some lag, which indicates autoregressive (AR) behaviour: we see decaying patterns (infinitely many non-null lags). In this context it makes sense that the price of a house depends on its past prices.
And the PACF is the following:
pacf(d1lnserie,ylim=c(-1,1),col=c(rep(1,11),2),lwd=2,lag.max=12*5,main ="PACF(d1lnserie)")
pacf(d1lnserie,ylim=c(-1,1),col=c(rep(1,11),2),lwd=2,lag.max=12*25,main ="PACF(d1lnserie)")
#### Models

- ARIMA(1,1,0)(1,0,1)[12]
- ARIMA(1,1,1)(1,0,1)[12]
- ARIMA(2,2,1)(1,0,1)[12]
From the plots we observed some characteristics of the time series, but we will check numerically which model is best by optimizing the metrics: AIC (minimize), log-likelihood (maximize) and BIC (minimize).
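As a sanity check on our hand-picked candidates (a sketch, assuming the `forecast` package; the orders it returns need not match ours), `auto.arima` can search the seasonal ARIMA space automatically:

```r
library(forecast)
# Exhaustive search (stepwise = FALSE) over (p,d,q)(P,D,Q)[12], ranked by AICc
auto_fit <- auto.arima(lnserie, seasonal = TRUE,
                       stepwise = FALSE, approximation = FALSE)
summary(auto_fit)
```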
Fitting each candidate manually:
model1 <- arima(lnserie, order = c(1, 1, 0), seasonal = list(order = c(1, 0, 1), period = 12))
summary(model1)
##
## Call:
## arima(x = lnserie, order = c(1, 1, 0), seasonal = list(order = c(1, 0, 1), period = 12))
##
## Coefficients:
## ar1 sar1 sma1
## 0.1715 0.9571 -0.7920
## s.e. 0.0535 0.0299 0.0754
##
## sigma^2 estimated as 0.0001214: log likelihood = 1096, aic = -2184
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0008447912 0.01101892 0.00846293 0.007610119 0.06809117 0.803744
## ACF1
## Training set -0.06665244
model2 <- arima(lnserie, order = c(1, 1, 1), seasonal = list(order = c(1, 0, 1), period = 12))
summary(model2)
##
## Call:
## arima(x = lnserie, order = c(1, 1, 1), seasonal = list(order = c(1, 0, 1), period = 12))
##
## Coefficients:
## ar1 ma1 sar1 sma1
## 0.9119 -0.7297 0.9714 -0.8490
## s.e. 0.0317 0.0465 0.0237 0.0659
##
## sigma^2 estimated as 0.0001043: log likelihood = 1122.49, aic = -2234.99
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set 0.0003577645 0.01021758 0.007783959 0.003337475 0.06256963
## MASE ACF1
## Training set 0.7392606 -0.1968962
model3 <- arima(lnserie, order = c(2, 2, 1), seasonal = list(order = c(1, 0, 1), period = 12))
summary(model3)
##
## Call:
## arima(x = lnserie, order = c(2, 2, 1), seasonal = list(order = c(1, 0, 1), period = 12))
##
## Coefficients:
## ar1 ar2 ma1 sar1 sma1
## -0.5990 -0.2677 -0.4011 0.9880 -0.8849
## s.e. 0.0972 0.0796 0.0952 0.0105 0.0480
##
## sigma^2 estimated as 9.665e-05: log likelihood = 1129.93, aic = -2247.86
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set -7.671904e-05 0.009839481 0.00759426 -0.0005212171 0.06098044
## MASE ACF1
## Training set 0.7212444 0.0005329013
# forecast::Arima refits model 3 and additionally reports AICc and BIC
Arima(lnserie, order = c(2, 2, 1), seasonal = list(order = c(1, 0, 1), period = 12))
## Series: lnserie
## ARIMA(2,2,1)(1,0,1)[12]
##
## Coefficients:
## ar1 ar2 ma1 sar1 sma1
## -0.5990 -0.2677 -0.4011 0.9880 -0.8849
## s.e. 0.0972 0.0796 0.0952 0.0105 0.0480
##
## sigma^2 = 9.875e-05: log likelihood = 1129.93
## AIC=-2247.86 AICc=-2247.62 BIC=-2224.63
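To compare the three fits at a glance, a small sketch (assuming `model1`, `model2` and `model3` from above; `logLik`, `AIC` and `BIC` all work directly on `arima` fits):

```r
fits <- list(model1, model2, model3)
data.frame(
  model  = c("ARIMA(1,1,0)(1,0,1)[12]",
             "ARIMA(1,1,1)(1,0,1)[12]",
             "ARIMA(2,2,1)(1,0,1)[12]"),
  loglik = sapply(fits, logLik),  # maximize
  AIC    = sapply(fits, AIC),     # minimize
  BIC    = sapply(fits, BIC)      # minimize
)
```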
validation(model1)
##
## --------------------------------------------------------------------
##
## Call:
## arima(x = lnserie, order = c(1, 1, 0), seasonal = list(order = c(1, 0, 1), period = 12))
##
## Coefficients:
## ar1 sar1 sma1
## 0.1715 0.9571 -0.7920
## s.e. 0.0535 0.0299 0.0754
##
## sigma^2 estimated as 0.0001214: log likelihood = 1096, aic = -2184
##
## Modul of AR Characteristic polynomial Roots: 1.003659 1.003659 1.003659 1.003659 1.003659 1.003659 1.003659 1.003659 1.003659 1.003659 1.003659 1.003659 5.829318
##
## Modul of MA Characteristic polynomial Roots: 1.019624 1.019624 1.019624 1.019624 1.019624 1.019624 1.019624 1.019624 1.019624 1.019624 1.019624 1.019624
##
## Psi-weights (MA(inf))
##
## --------------------
## psi 1 psi 2 psi 3 psi 4 psi 5 psi 6
## 1.715467e-01 2.942826e-02 5.048319e-03 8.660222e-04 1.485632e-04 2.548552e-05
## psi 7 psi 8 psi 9 psi 10 psi 11 psi 12
## 4.371956e-06 7.499945e-07 1.286591e-07 2.207103e-08 3.786211e-09 1.651281e-01
## psi 13 psi 14 psi 15 psi 16 psi 17 psi 18
## 2.832717e-02 4.859432e-03 8.336193e-04 1.430046e-04 2.453196e-05 4.208376e-06
## psi 19 psi 20 psi 21 psi 22 psi 23 psi 24
## 7.219329e-07 1.238452e-07 2.124522e-08 3.644547e-09 6.252099e-10 1.580470e-01
##
## Pi-weights (AR(inf))
##
## --------------------
## pi 1 pi 2 pi 3 pi 4 pi 5 pi 6
## 0.17154666 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## pi 7 pi 8 pi 9 pi 10 pi 11 pi 12
## 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.16512810
## pi 13 pi 14 pi 15 pi 16 pi 17 pi 18
## -0.02832717 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## pi 19 pi 20 pi 21 pi 22 pi 23 pi 24
## 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.13077967
##
## Descriptive Statistics for the Residuals
##
## ----------------------------------------
## resi
## nobs 357.000000
## NAs 0.000000
## Minimum -0.041723
## Maximum 0.034892
## 1. Quartile -0.005863
## 3. Quartile 0.008048
## Mean 0.000845
## Median 0.000340
## Sum 0.301590
## SE Mean 0.000582
## LCL Mean -0.000300
## UCL Mean 0.001990
## Variance 0.000121
## Stdev 0.011002
## Skewness -0.209288
## Kurtosis 0.901107
##
## Normality Tests
##
## --------------------
##
## Shapiro-Wilk normality test
##
## data: resi
## W = 0.98634, p-value = 0.001902
##
##
## Anderson-Darling normality test
##
## data: resi
## A = 1.0426, p-value = 0.009554
##
##
## Jarque Bera Test
##
## data: resi
## X-squared = 15.302, df = 2, p-value = 0.0004755
##
##
## Homoscedasticity Test
##
## --------------------
##
## studentized Breusch-Pagan test
##
## data: resi ~ I(obs - resi)
## BP = 0.75303, df = 1, p-value = 0.3855
##
##
## Independence Tests
##
## --------------------
##
## Durbin-Watson test
##
## data: resi ~ I(1:length(resi))
## DW = 2.1873, p-value = 0.9574
## alternative hypothesis: true autocorrelation is greater than 0
##
##
## Ljung-Box test
## lag.df statistic p.value
## [1,] 1 1.599354 2.059947e-01
## [2,] 2 32.227461 1.004374e-07
## [3,] 3 63.791286 9.092727e-14
## [4,] 4 71.361795 1.165734e-14
## [5,] 12 107.585909 0.000000e+00
## [6,] 24 126.698875 6.661338e-16
## [7,] 36 148.578670 1.221245e-15
## [8,] 48 157.537675 1.385558e-13
validation(model2)
##
## --------------------------------------------------------------------
##
## Call:
## arima(x = lnserie, order = c(1, 1, 1), seasonal = list(order = c(1, 0, 1), period = 12))
##
## Coefficients:
## ar1 ma1 sar1 sma1
## 0.9119 -0.7297 0.9714 -0.8490
## s.e. 0.0317 0.0465 0.0237 0.0659
##
## sigma^2 estimated as 0.0001043: log likelihood = 1122.49, aic = -2234.99
##
## Modul of AR Characteristic polynomial Roots: 1.00242 1.00242 1.00242 1.00242 1.00242 1.00242 1.00242 1.00242 1.00242 1.00242 1.00242 1.00242 1.09656
##
## Modul of MA Characteristic polynomial Roots: 1.013737 1.013737 1.013737 1.013737 1.013737 1.013737 1.013737 1.013737 1.013737 1.013737 1.013737 1.013737 1.370444
##
## Psi-weights (MA(inf))
##
## --------------------
## psi 1 psi 2 psi 3 psi 4 psi 5 psi 6 psi 7
## 0.18225199 0.16620332 0.15156786 0.13822116 0.12604974 0.11495010 0.10482787
## psi 8 psi 9 psi 10 psi 11 psi 12 psi 13 psi 14
## 0.09559698 0.08717894 0.07950218 0.07250141 0.18855315 0.08260921 0.07533484
## psi 15 psi 16 psi 17 psi 18 psi 19 psi 20 psi 21
## 0.06870104 0.06265139 0.05713446 0.05210334 0.04751524 0.04333116 0.03951553
## psi 22 psi 23 psi 24
## 0.03603588 0.03286265 0.14890470
##
## Pi-weights (AR(inf))
##
## --------------------
## pi 1 pi 2 pi 3 pi 4 pi 5
## 0.1822519882 0.1329875311 0.0970397284 0.0708089609 0.0516686209
## pi 6 pi 7 pi 8 pi 9 pi 10
## 0.0377020981 0.0275108601 0.0200744112 0.0146481057 0.0106885825
## pi 11 pi 12 pi 13 pi 14 pi 15
## 0.0077993563 0.1281271535 -0.0181614587 -0.0132522426 -0.0096700345
## pi 16 pi 17 pi 18 pi 19 pi 20
## -0.0070561316 -0.0051487917 -0.0037570240 -0.0027414645 -0.0020004204
## pi 21 pi 22 pi 23 pi 24
## -0.0014596876 -0.0010651201 -0.0007772079 0.1033781501
##
## Descriptive Statistics for the Residuals
##
## ----------------------------------------
## resi
## nobs 357.000000
## NAs 0.000000
## Minimum -0.041512
## Maximum 0.033781
## 1. Quartile -0.005891
## 3. Quartile 0.006263
## Mean 0.000358
## Median -0.000314
## Sum 0.127722
## SE Mean 0.000541
## LCL Mean -0.000707
## UCL Mean 0.001422
## Variance 0.000105
## Stdev 0.010226
## Skewness -0.117473
## Kurtosis 1.033953
##
## Normality Tests
##
## --------------------
##
## Shapiro-Wilk normality test
##
## data: resi
## W = 0.98796, p-value = 0.004707
##
##
## Anderson-Darling normality test
##
## data: resi
## A = 1.006, p-value = 0.01176
##
##
## Jarque Bera Test
##
## data: resi
## X-squared = 17.436, df = 2, p-value = 0.0001636
##
##
## Homoscedasticity Test
##
## --------------------
##
## studentized Breusch-Pagan test
##
## data: resi ~ I(obs - resi)
## BP = 1.0598, df = 1, p-value = 0.3033
##
##
## Independence Tests
##
## --------------------
##
## Durbin-Watson test
##
## data: resi ~ I(1:length(resi))
## DW = 2.4081, p-value = 0.9999
## alternative hypothesis: true autocorrelation is greater than 0
##
##
## Ljung-Box test
## lag.df statistic p.value
## [1,] 1 13.95685 1.870550e-04
## [2,] 2 19.27533 6.522534e-05
## [3,] 3 31.29834 7.355588e-07
## [4,] 4 31.30517 2.652466e-06
## [5,] 12 51.06593 9.068729e-07
## [6,] 24 78.28989 1.136670e-07
## [7,] 36 90.10894 1.559193e-06
## [8,] 48 96.91819 3.730502e-05
validation(model3)
##
## --------------------------------------------------------------------
##
## Call:
## arima(x = lnserie, order = c(2, 2, 1), seasonal = list(order = c(1, 0, 1), period = 12))
##
## Coefficients:
## ar1 ar2 ma1 sar1 sma1
## -0.5990 -0.2677 -0.4011 0.9880 -0.8849
## s.e. 0.0972 0.0796 0.0952 0.0105 0.0480
##
## sigma^2 estimated as 9.665e-05: log likelihood = 1129.93, aic = -2247.86
##
## Modul of AR Characteristic polynomial Roots: 1.001006 1.001006 1.001006 1.001006 1.001006 1.001006 1.001006 1.001006 1.001006 1.001006 1.001006 1.001006 1.932893 1.932893
##
## Modul of MA Characteristic polynomial Roots: 1.010244 1.010244 1.010244 1.010244 1.010244 1.010244 1.010244 1.010244 1.010244 1.010244 1.010244 1.010244 2.493415
##
## Psi-weights (MA(inf))
##
## --------------------
## psi 1 psi 2 psi 3 psi 4 psi 5
## -1.0000498046 0.3313627955 0.0691896329 -0.1301368397 0.0594317800
## psi 6 psi 7 psi 8 psi 9 psi 10
## -0.0007667614 -0.0154482506 0.0094586316 -0.0015307725 -0.0016147788
## psi 11 psi 12 psi 13 psi 14 psi 15
## 0.0013769690 0.1027261388 -0.1032572631 0.0343546959 0.0070596473
## psi 16 psi 17 psi 18 psi 19 psi 20
## -0.0134240746 0.0061513437 -0.0000915207 -0.0015916510 0.0009778849
## psi 21 psi 22 psi 23 psi 24
## -0.0001597246 -0.0001660671 0.0001422251 0.1018409187
##
## Pi-weights (AR(inf))
##
## --------------------
## pi 1 pi 2 pi 3 pi 4 pi 5
## -1.000050e+00 -6.687368e-01 -2.682012e-01 -1.075638e-01 -4.313916e-02
## pi 6 pi 7 pi 8 pi 9 pi 10
## -1.730124e-02 -6.938771e-03 -2.782839e-03 -1.116075e-03 -4.476092e-04
## pi 11 pi 12 pi 13 pi 14 pi 15
## -1.795165e-04 1.030467e-01 1.030950e-01 6.894771e-02 2.765192e-02
## pi 16 pi 17 pi 18 pi 19 pi 20
## 1.108998e-02 4.447708e-03 1.783782e-03 7.153971e-04 2.869146e-04
## pi 21 pi 22 pi 23 pi 24
## 1.150689e-04 4.614913e-05 1.850841e-05 9.125561e-02
##
## Descriptive Statistics for the Residuals
##
## ----------------------------------------
## resi
## nobs 357.000000
## NAs 0.000000
## Minimum -0.038214
## Maximum 0.027894
## 1. Quartile -0.006283
## 3. Quartile 0.005414
## Mean -0.000077
## Median -0.000337
## Sum -0.027389
## SE Mean 0.000521
## LCL Mean -0.001102
## UCL Mean 0.000949
## Variance 0.000097
## Stdev 0.009853
## Skewness 0.095842
## Kurtosis 0.677857
##
## Normality Tests
##
## --------------------
##
## Shapiro-Wilk normality test
##
## data: resi
## W = 0.98943, p-value = 0.01108
##
##
## Anderson-Darling normality test
##
## data: resi
## A = 1.0988, p-value = 0.006943
##
##
## Jarque Bera Test
##
## data: resi
## X-squared = 7.8097, df = 2, p-value = 0.02014
##
##
## Homoscedasticity Test
##
## --------------------
##
## studentized Breusch-Pagan test
##
## data: resi ~ I(obs - resi)
## BP = 0.65351, df = 1, p-value = 0.4189
##
##
## Independence Tests
##
## --------------------
##
## Durbin-Watson test
##
## data: resi ~ I(1:length(resi))
## DW = 1.995, p-value = 0.4599
## alternative hypothesis: true autocorrelation is greater than 0
##
##
## Ljung-Box test
## lag.df statistic p.value
## [1,] 1 1.022366e-04 0.99193256
## [2,] 2 9.446917e-03 0.99528768
## [3,] 3 3.271435e-02 0.99844164
## [4,] 4 4.883515e-02 0.99970670
## [5,] 12 1.753563e+01 0.13053495
## [6,] 24 3.390122e+01 0.08643994
## [7,] 36 4.046930e+01 0.27952235
## [8,] 48 4.553930e+01 0.57423503
Expression analysis:

Model 1. Causality and invertibility: the model is both causal and invertible. This can be confirmed analytically, since the moduli of the roots of both the AR and MA characteristic polynomials are greater than 1, and graphically, since the inverse roots of both polynomials lie inside the unit circle.

Model 2. Causality and invertibility: the model is both causal and invertible, for the same reasons as model 1.

Model 3. Causality and invertibility: the model is both causal and invertible, for the same reasons as model 1.
We have seen that model 3 attains the best AIC (-2247.86), log-likelihood (1129.93) and BIC, and that it is the only model whose residuals pass the Ljung-Box independence test. To compare predictive performance, however, we will use out-of-sample forecasting errors (RMSE, MAE) on a held-out test year.
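A single 12-month hold-out is a small sample for model comparison. A hedged alternative, sketched here for one candidate only (it refits at every origin, so it is slow), is rolling-origin cross-validation with `forecast::tsCV`:

```r
library(forecast)
# One-step-ahead forecast function for the ARIMA(1,1,1)(1,0,1)[12] candidate
fc_fun <- function(y, h) {
  forecast(arima(y, order = c(1, 1, 1),
                 seasonal = list(order = c(1, 0, 1), period = 12)), h = h)
}
e <- tsCV(lnserie, fc_fun, h = 1, initial = 120)  # skip too-short origins
sqrt(mean(e^2, na.rm = TRUE))                     # cross-validated RMSE
```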
## train/test split: hold out the last 12 months
train_data <- lnserie[1:(length(lnserie) - 12)]
test_data <- lnserie[(length(lnserie) - 11):length(lnserie)]
## refit the three candidates on the training set
model1 <- arima(train_data, order = c(1, 1, 0), seasonal = list(order = c(1, 0, 1), period = 12))
model2 <- arima(train_data, order = c(1, 1, 1), seasonal = list(order = c(1, 0, 1), period = 12))
model3 <- arima(train_data, order = c(2, 2, 1), seasonal = list(order = c(1, 0, 1), period = 12))
forecast_values1 <- forecast(model1, h = 12)
forecast_values2 <- forecast(model2, h = 12)
forecast_values3 <- forecast(model3, h = 12)
accuracy(forecast_values1, test_data)
## ME RMSE MAE MPE MAPE
## Training set 0.001059155 0.01110578 0.008561819 0.00930855 0.06898541
## Test set -0.028730149 0.02996202 0.028730149 -0.21846636 0.21846636
## MASE ACF1
## Training set 0.8083084 -0.0720078
## Test set 2.7123698 NA
accuracy(forecast_values2, test_data)
## ME RMSE MAE MPE MAPE
## Training set 0.0004948748 0.01029705 0.007865779 0.004423214 0.06332901
## Test set -0.0259639947 0.02722785 0.025963995 -0.197437223 0.19743722
## MASE ACF1
## Training set 0.7425962 -0.2045885
## Test set 2.4512213 NA
accuracy(forecast_values3, test_data)
## ME RMSE MAE MPE MAPE
## Training set -3.448897e-05 0.009891862 0.007631939 -0.0001886737 0.0614081
## Test set -2.882083e-02 0.029704729 0.028820833 -0.2191253091 0.2191253
## MASE ACF1
## Training set 0.7205198 0.0004217666
## Test set 2.7209311 NA
# model 2 achieved the lowest test-set RMSE, so we refit it on the full series
model2 <- arima(lnserie, order = c(1, 1, 1), seasonal = list(order = c(1, 0, 1), period = 12))
best_model <- model2
forecast_results <- forecast(best_model, h = 12)
plot(forecast_results)
print(forecast_results)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Oct 2024 13.16481 13.15172 13.17790 13.14479 13.18483
## Nov 2024 13.16318 13.14291 13.18345 13.13218 13.19418
## Dec 2024 13.16513 13.13825 13.19201 13.12402 13.20624
## Jan 2025 13.16622 13.13293 13.19950 13.11531 13.21713
## Feb 2025 13.16197 13.12238 13.20157 13.10141 13.22253
## Mar 2025 13.16143 13.11559 13.20727 13.09132 13.23154
## Apr 2025 13.16531 13.11329 13.21734 13.08575 13.24488
## May 2025 13.16673 13.10858 13.22487 13.07780 13.25566
## Jun 2025 13.17611 13.11191 13.24032 13.07792 13.27430
## Jul 2025 13.18356 13.11337 13.25375 13.07621 13.29091
## Aug 2025 13.18717 13.11106 13.26327 13.07078 13.30356
## Sep 2025 13.18680 13.10486 13.26873 13.06149 13.31210
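One caveat before back-transforming (a sketch, assuming `forecast_results` from above): `exp()` of a log-scale forecast gives the *median* on the price scale. A lognormal adjustment using the per-horizon forecast standard error approximates the mean:

```r
# Recover the per-horizon standard error from the 95% interval
se_h <- (forecast_results$upper[, 2] - forecast_results$mean) / qnorm(0.975)
price_median <- exp(forecast_results$mean)               # back-transformed median
price_mean   <- exp(forecast_results$mean + se_h^2 / 2)  # lognormal mean adjustment
cbind(median = price_median, mean = price_mean)
```

For this series the adjustment is small because the residual variance is tiny, so plotting `exp()` of the point forecasts, as below, is a reasonable simplification.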
# Subset the last 5 years (60 observations) of historical data
historical_data <- window(lnserie, start = time(lnserie)[length(time(lnserie)) - (5 * 12) + 1])
# Apply the exponential transformation to revert to the original scale
forecast_df <- data.frame(
Time = c(time(historical_data), time(forecast_results$mean)),
Value = c(exp(as.numeric(historical_data)), exp(as.numeric(forecast_results$mean))),
Type = c(rep("Historical", length(historical_data)), rep("Forecast", length(forecast_results$mean))),
Lower80 = c(rep(NA, length(historical_data)), exp(as.numeric(forecast_results$lower[, 1]))), # 80% CI lower
Upper80 = c(rep(NA, length(historical_data)), exp(as.numeric(forecast_results$upper[, 1]))), # 80% CI upper
Lower95 = c(rep(NA, length(historical_data)), exp(as.numeric(forecast_results$lower[, 2]))), # 95% CI lower
Upper95 = c(rep(NA, length(historical_data)), exp(as.numeric(forecast_results$upper[, 2]))) # 95% CI upper
)
library(ggplot2)
ggplot(forecast_df, aes(x = Time, y = Value, color = Type)) +
geom_line(linewidth = 1) + # line for historical and forecast data ('linewidth' replaces the deprecated 'size')
# 80% confidence interval ribbon
geom_ribbon(data = subset(forecast_df, Type == "Forecast"),
aes(ymin = Lower80, ymax = Upper80), fill = "lightgreen", alpha = 0.3) +
# 95% confidence interval ribbon
geom_ribbon(data = subset(forecast_df, Type == "Forecast"),
aes(ymin = Lower95, ymax = Upper95), fill = "lightblue", alpha = 0.4) +
labs(title = "Time Series with 1-Year Forecast and Confidence Intervals",
x = "Time",
y = "Housing Price (Original Scale)",
color = "Data Type") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(floor(min(forecast_df$Time)), ceiling(max(forecast_df$Time)), by = 1)) +
scale_color_manual(values = c("Historical" = "black", "Forecast" = "blue"))